Internet Message Format | 1990-08-19 | 68.3 KB

Path: abcfd20.larc.nasa.gov!amiga-request
From: amiga-request@abcfd20.larc.nasa.gov (Amiga Sources/Binaries Moderator)
Subject: v90i238: flex 2.3 - fast lexical analyzer generator, Part11/13
Reply-To: loftus@wpllabs.uucp (William P Loftus)
Newsgroups: comp.sources.amiga
Message-ID: <comp.sources.amiga:v90i238@abcfd20.larc.nasa.gov>
Date: 19 Aug 90 22:43:21 GMT
Approved: tadguy@uunet.UU.NET (Tad Guy)
X-Mail-Submissions-To: amiga@uunet.uu.net
X-Post-Discussions-To: comp.sys.amiga

Submitted-by: loftus@wpllabs.uucp (William P Loftus)
Posting-number: Volume 90, Issue 238
Archive-name: unix/flex-2.3/part11

#!/bin/sh
# This is a shell archive. Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file". To overwrite existing
# files, type "sh file -c". You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.. If this archive is complete, you
# will see the following message at the end:
#     "End of archive 11 (of 13)."
# Contents: flexdoc.1
# Wrapped by tadguy@abcfd20 on Sun Aug 19 18:41:49 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'flexdoc.1' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'flexdoc.1'\"
else
echo shar: Extracting \"'flexdoc.1'\" \(65353 characters\)
sed "s/^X//" >'flexdoc.1' <<'END_OF_FILE'
X.TH FLEX 1 "26 May 1990" "Version 2.3"
X.SH NAME
Xflex - fast lexical analyzer generator
X.SH SYNOPSIS
X.B flex
X.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
X.I [filename ...]
X.SH DESCRIPTION
X.I flex
Xis a tool for generating
X.I scanners:
Xprograms which recognize lexical patterns in text.
X.I flex
Xreads
Xthe given input files, or its standard input if no file names are given,
Xfor a description of a scanner to generate. The description is in
Xthe form of pairs
Xof regular expressions and C code, called
X.I rules. flex
Xgenerates as output a C source file,
X.B lex.yy.c,
Xwhich defines a routine
X.B yylex().
XThis file is compiled and linked with the
X.B -lfl
Xlibrary to produce an executable. When the executable is run,
Xit analyzes its input for occurrences
Xof the regular expressions. Whenever it finds one, it executes
Xthe corresponding C code.
X.SH SOME SIMPLE EXAMPLES
X.LP
XFirst some simple examples to get the flavor of how one uses
X.I flex.
XThe following
X.I flex
Xinput specifies a scanner which whenever it encounters the string
X"username" will replace it with the user's login name:
X.nf
X
X    %%
X    username    printf( "%s", getlogin() );
X
X.fi
XBy default, any text not matched by a
X.I flex
Xscanner
Xis copied to the output, so the net effect of this scanner is
Xto copy its input file to its output with each occurrence
Xof "username" expanded.
XIn this input, there is just one rule. "username" is the
X.I pattern
Xand the "printf" is the
X.I action.
XThe "%%" marks the beginning of the rules.
X.LP
XHere's another simple example:
X.nf
X
X        int num_lines = 0, num_chars = 0;
X
X    %%
X    \\n    ++num_lines; ++num_chars;
X    .     ++num_chars;
X
X    %%
X    main()
X        {
X        yylex();
X        printf( "# of lines = %d, # of chars = %d\\n",
X                num_lines, num_chars );
X        }
X
X.fi
XThis scanner counts the number of characters and the number
Xof lines in its input (it produces no output other than the
Xfinal report on the counts). The first line
Xdeclares two globals, "num_lines" and "num_chars", which are accessible
Xboth inside
X.B yylex()
Xand in the
X.B main()
Xroutine declared after the second "%%". There are two rules, one
Xwhich matches a newline ("\\n") and increments both the line count and
Xthe character count, and one which matches any character other than
Xa newline (indicated by the "." regular expression).
X.LP
XA somewhat more complicated example:
X.nf
X
X    /* scanner for a toy Pascal-like language */
X
X    %{
X    /* need this for the call to atof() below */
X    #include <math.h>
X    %}
X
X    DIGIT    [0-9]
X    ID       [a-z][a-z0-9]*
X
X    %%
X
X    {DIGIT}+    {
X                printf( "An integer: %s (%d)\\n", yytext,
X                        atoi( yytext ) );
X                }
X
X    {DIGIT}+"."{DIGIT}*        {
X                printf( "A float: %s (%g)\\n", yytext,
X                        atof( yytext ) );
X                }
X
X    if|then|begin|end|procedure|function        {
X                printf( "A keyword: %s\\n", yytext );
X                }
X
X    {ID}        printf( "An identifier: %s\\n", yytext );
X
X    "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
X
X    "{"[^}\\n]*"}"     /* eat up one-line comments */
X
X    [ \\t\\n]+          /* eat up whitespace */
X
X    .           printf( "Unrecognized character: %s\\n", yytext );
X
X    %%
X
X    main( argc, argv )
X    int argc;
X    char **argv;
X        {
X        ++argv, --argc;  /* skip over program name */
X        if ( argc > 0 )
X                yyin = fopen( argv[0], "r" );
X        else
X                yyin = stdin;
X
X        yylex();
X        }
X
X.fi
XThis is the beginnings of a simple scanner for a language like
XPascal. It identifies different types of
X.I tokens
Xand reports on what it has seen.
X.LP
XThe details of this example will be explained in the following
Xsections.
X.SH FORMAT OF THE INPUT FILE
XThe
X.I flex
Xinput file consists of three sections, separated by a line with just
X.B %%
Xin it:
X.nf
X
X    definitions
X    %%
X    rules
X    %%
X    user code
X
X.fi
XThe
X.I definitions
Xsection contains declarations of simple
X.I name
Xdefinitions to simplify the scanner specification, and declarations of
X.I start conditions,
Xwhich are explained in a later section.
X.LP
XName definitions have the form:
X.nf
X
X    name definition
X
X.fi
XThe "name" is a word beginning with a letter or an underscore ('_')
Xfollowed by zero or more letters, digits, '_', or '-' (dash).
XThe definition is taken to begin at the first non-white-space character
Xfollowing the name and continuing to the end of the line.
XThe definition can subsequently be referred to using "{name}", which
Xwill expand to "(definition)". For example,
X.nf
X
X    DIGIT    [0-9]
X    ID       [a-z][a-z0-9]*
X
X.fi
Xdefines "DIGIT" to be a regular expression which matches a
Xsingle digit, and
X"ID" to be a regular expression which matches a letter
Xfollowed by zero-or-more letters-or-digits.
XA subsequent reference to
X.nf
X
X    {DIGIT}+"."{DIGIT}*
X
X.fi
Xis identical to
X.nf
X
X    ([0-9])+"."([0-9])*
X
X.fi
Xand matches one-or-more digits followed by a '.' followed
Xby zero-or-more digits.
X.LP
XThe
X.I rules
Xsection of the
X.I flex
Xinput contains a series of rules of the form:
X.nf
X
X    pattern   action
X
X.fi
Xwhere the pattern must be unindented and the action must begin
Xon the same line.
X.LP
XSee below for a further description of patterns and actions.
X.LP
XFinally, the user code section is simply copied to
X.B lex.yy.c
Xverbatim.
XIt is used for companion routines which call or are called
Xby the scanner. The presence of this section is optional;
Xif it is missing, the second
X.B %%
Xin the input file may be skipped, too.
X.LP
XIn the definitions and rules sections, any
X.I indented
Xtext or text enclosed in
X.B %{
Xand
X.B %}
Xis copied verbatim to the output (with the %{}'s removed).
XThe %{}'s must appear unindented on lines by themselves.
X.LP
XIn the rules section,
Xany indented or %{} text appearing before the
Xfirst rule may be used to declare variables
Xwhich are local to the scanning routine and (after the declarations)
Xcode which is to be executed whenever the scanning routine is entered.
XOther indented or %{} text in the rule section is still copied to the output,
Xbut its meaning is not well-defined and it may well cause compile-time
Xerrors (this feature is present for
X.I POSIX
Xcompliance; see below for other such features).
X.LP
XIn the definitions section, an unindented comment (i.e., a line
Xbeginning with "/*") is also copied verbatim to the output up
Xto the next "*/".
XAlso, any line in the definitions section
Xbeginning with '#' is ignored, though this style of comment is
Xdeprecated and may go away in the future.
X.SH PATTERNS
XThe patterns in the input are written using an extended set of regular
Xexpressions. These are:
X.nf
X
X    x          match the character 'x'
X    .          any character except newline
X    [xyz]      a "character class"; in this case, the pattern
X                 matches either an 'x', a 'y', or a 'z'
X    [abj-oZ]   a "character class" with a range in it; matches
X                 an 'a', a 'b', any letter from 'j' through 'o',
X                 or a 'Z'
X    [^A-Z]     a "negated character class", i.e., any character
X                 but those in the class. In this case, any
X                 character EXCEPT an uppercase letter.
X    [^A-Z\\n]   any character EXCEPT an uppercase letter or
X                 a newline
X    r*         zero or more r's, where r is any regular expression
X    r+         one or more r's
X    r?         zero or one r's (that is, "an optional r")
X    r{2,5}     anywhere from two to five r's
X    r{2,}      two or more r's
X    r{4}       exactly 4 r's
X    {name}     the expansion of the "name" definition
X               (see above)
X    "[xyz]\\"foo"
X               the literal string: [xyz]"foo
X    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
X                 then the ANSI-C interpretation of \\x.
X                 Otherwise, a literal 'X' (used to escape
X                 operators such as '*')
X    \\123       the character with octal value 123
X    \\x2a       the character with hexadecimal value 2a
X    (r)        match an r; parentheses are used to override
X                 precedence (see below)
X
X
X    rs         the regular expression r followed by the
X                 regular expression s; called "concatenation"
X
X
X    r|s        either an r or an s
X
X
X    r/s        an r but only if it is followed by an s. The
X                 s is not part of the matched text. This type
X                 of pattern is called "trailing context".
X    ^r         an r, but only at the beginning of a line
X    r$         an r, but only at the end of a line. Equivalent
X                 to "r/\\n".
X
X
X    <s>r       an r, but only in start condition s (see
X               below for discussion of start conditions)
X    <s1,s2,s3>r
X               same, but in any of start conditions s1,
X               s2, or s3
X
X
X    <<EOF>>    an end-of-file
X    <s1,s2><<EOF>>
X               an end-of-file when in start condition s1 or s2
X
X.fi
XThe regular expressions listed above are grouped according to
Xprecedence, from highest precedence at the top to lowest at the bottom.
XThose grouped together have equal precedence. For example,
X.nf
X
X    foo|bar*
X
X.fi
Xis the same as
X.nf
X
X    (foo)|(ba(r*))
X
X.fi
Xsince the '*' operator has higher precedence than concatenation,
Xand concatenation higher than alternation ('|'). This pattern
Xtherefore matches
X.I either
Xthe string "foo"
X.I or
Xthe string "ba" followed by zero-or-more r's.
XTo match "foo" or zero-or-more "bar"'s, use:
X.nf
X
X    foo|(bar)*
X
X.fi
Xand to match zero-or-more "foo"'s-or-"bar"'s:
X.nf
X
X    (foo|bar)*
X
X.fi
X.LP
XSome notes on patterns:
X.IP -
XA negated character class such as the example "[^A-Z]"
Xabove
X.I will match a newline
Xunless "\\n" (or an equivalent escape sequence) is one of the
Xcharacters explicitly present in the negated character class
X(e.g., "[^A-Z\\n]"). This is unlike how many other regular
Xexpression tools treat negated character classes, but unfortunately
Xthe inconsistency is historically entrenched.
XMatching newlines means that a pattern like [^"]* can match an entire
Xinput (overflowing the scanner's input buffer) unless there's another
Xquote in the input.
X.IP -
XA rule can have at most one instance of trailing context (the '/' operator
Xor the '$' operator). The start condition, '^', and "<<EOF>>" patterns
Xcan only occur at the beginning of a pattern, and, as well as with '/' and '$',
Xcannot be grouped inside parentheses. A '^' which does not occur at
Xthe beginning of a rule or a '$' which does not occur at the end of
Xa rule loses its special properties and is treated as a normal character.
X.IP
XThe following are illegal:
X.nf
X
X    foo/bar$
X    <sc1>foo<sc2>bar
X
X.fi
XNote that the first of these can be written "foo/bar\\n".
X.IP
XThe following will result in '$' or '^' being treated as a normal character:
X.nf
X
X    foo|(bar$)
X    foo|^bar
X
X.fi
XIf what's wanted is a "foo" or a bar-followed-by-a-newline, the following
Xcould be used (the special '|' action is explained below):
X.nf
X
X    foo      |
X    bar$     /* action goes here */
X
X.fi
XA similar trick will work for matching a foo or a
Xbar-at-the-beginning-of-a-line.
X.SH HOW THE INPUT IS MATCHED
XWhen the generated scanner is run, it analyzes its input looking
Xfor strings which match any of its patterns. If it finds more than
Xone match, it takes the one matching the most text (for trailing
Xcontext rules, this includes the length of the trailing part, even
Xthough it will then be returned to the input). If it finds two
Xor more matches of the same length, the
Xrule listed first in the
X.I flex
Xinput file is chosen.
X.LP
XOnce the match is determined, the text corresponding to the match
X(called the
X.I token)
Xis made available in the global character pointer
X.B yytext,
Xand its length in the global integer
X.B yyleng.
XThe
X.I action
Xcorresponding to the matched pattern is then executed (a more
Xdetailed description of actions follows), and then the remaining
Xinput is scanned for another match.
X.LP
XIf no match is found, then the
X.I default rule
Xis executed: the next character in the input is considered matched and
Xcopied to the standard output. Thus, the simplest legal
X.I flex
Xinput is:
X.nf
X
X    %%
X
X.fi
Xwhich generates a scanner that simply copies its input (one character
Xat a time) to its output.
X.SH ACTIONS
XEach pattern in a rule has a corresponding action, which can be any
Xarbitrary C statement. The pattern ends at the first non-escaped
Xwhitespace character; the remainder of the line is its action.
XIf the
Xaction is empty, then when the pattern is matched the input token
Xis simply discarded. For example, here is the specification for a program
Xwhich deletes all occurrences of "zap me" from its input:
X.nf
X
X    %%
X    "zap me"
X
X.fi
X(It will copy all other characters in the input to the output since
Xthey will be matched by the default rule.)
X.LP
XHere is a program which compresses multiple blanks and tabs down to
Xa single blank, and throws away whitespace found at the end of a line:
X.nf
X
X    %%
X    [ \\t]+        putchar( ' ' );
X    [ \\t]+$       /* ignore this token */
X
X.fi
X.LP
XIf the action contains a '{', then the action spans till the balancing '}'
Xis found, and the action may cross multiple lines.
X.I flex
Xknows about C strings and comments and won't be fooled by braces found
Xwithin them, but also allows actions to begin with
X.B %{
Xand will consider the action to be all the text up to the next
X.B %}
X(regardless of ordinary braces inside the action).
X.LP
XAn action consisting solely of a vertical bar ('|') means "same as
Xthe action for the next rule." See below for an illustration.
X.LP
XActions can include arbitrary C code, including
X.B return
Xstatements to return a value to whatever routine called
X.B yylex().
XEach time
X.B yylex()
Xis called it continues processing tokens from where it last left
Xoff until it either reaches
Xthe end of the file or executes a return. Once it reaches an end-of-file,
Xhowever, then any subsequent call to
X.B yylex()
Xwill simply immediately return, unless
X.B yyrestart()
Xis first called (see below).
X.LP
XActions are not allowed to modify yytext or yyleng.
X.LP
XThere are a number of special directives which can be included within
Xan action:
X.IP -
X.B ECHO
Xcopies yytext to the scanner's output.
X.IP -
X.B BEGIN
Xfollowed by the name of a start condition places the scanner in the
Xcorresponding start condition (see below).
X.IP -
X.B REJECT
Xdirects the scanner to proceed on to the "second best" rule which matched the
Xinput (or a prefix of the input). The rule is chosen as described
Xabove in "How the Input is Matched", and
X.B yytext
Xand
X.B yyleng
Xset up appropriately.
XIt may either be one which matched as much text
Xas the originally chosen rule but came later in the
X.I flex
Xinput file, or one which matched less text.
XFor example, the following will both count the
Xwords in the input and call the routine special() whenever "frob" is seen:
X.nf
X
X        int word_count = 0;
X    %%
X
X    frob        special(); REJECT;
X    [^ \\t\\n]+   ++word_count;
X
X.fi
XWithout the
X.B REJECT,
Xany "frob"'s in the input would not be counted as words, since the
Xscanner normally executes only one action per token.
XMultiple
X.B REJECT's
Xare allowed, each one finding the next best choice to the currently
Xactive rule. For example, when the following scanner scans the token
X"abcd", it will write "abcdabcaba" to the output:
X.nf
X
X    %%
X    a        |
X    ab       |
X    abc      |
X    abcd     ECHO; REJECT;
X    .|\\n     /* eat up any unmatched character */
X
X.fi
X(The first three rules share the fourth's action since they use
Xthe special '|' action.)
X.B REJECT
Xis a particularly expensive feature in terms of scanner performance;
Xif it is used in
X.I any
Xof the scanner's actions it will slow down
X.I all
Xof the scanner's matching. Furthermore,
X.B REJECT
Xcannot be used with the
X.I -f
Xor
X.I -F
Xoptions (see below).
X.IP
XNote also that unlike the other special actions,
X.B REJECT
Xis a
X.I branch;
Xcode immediately following it in the action will
X.I not
Xbe executed.
X.IP -
X.B yymore()
Xtells the scanner that the next time it matches a rule, the corresponding
Xtoken should be
X.I appended
Xonto the current value of
X.B yytext
Xrather than replacing it.
XFor example, given the input "mega-kludge"
Xthe following will write "mega-mega-kludge" to the output:
X.nf
X
X    %%
X    mega-    ECHO; yymore();
X    kludge   ECHO;
X
X.fi
XFirst "mega-" is matched and echoed to the output. Then "kludge"
Xis matched, but the previous "mega-" is still hanging around at the
Xbeginning of
X.B yytext
Xso the
X.B ECHO
Xfor the "kludge" rule will actually write "mega-kludge".
XThe presence of
X.B yymore()
Xin the scanner's action entails a minor performance penalty in the
Xscanner's matching speed.
X.IP -
X.B yyless(n)
Xreturns all but the first
X.I n
Xcharacters of the current token back to the input stream, where they
Xwill be rescanned when the scanner looks for the next match.
X.B yytext
Xand
X.B yyleng
Xare adjusted appropriately (e.g.,
X.B yyleng
Xwill now be equal to
X.I n
X). For example, on the input "foobar" the following will write out
X"foobarbar":
X.nf
X
X    %%
X    foobar    ECHO; yyless(3);
X    [a-z]+    ECHO;
X
X.fi
XAn argument of 0 to
X.B yyless
Xwill cause the entire current input string to be scanned again. Unless you've
Xchanged how the scanner will subsequently process its input (using
X.B BEGIN,
Xfor example), this will result in an endless loop.
X.IP -
X.B unput(c)
Xputs the character
X.I c
Xback onto the input stream. It will be the next character scanned.
XThe following action will take the current token and cause it
Xto be rescanned enclosed in parentheses.
X.nf
X
X    {
X    int i;
X    unput( ')' );
X    for ( i = yyleng - 1; i >= 0; --i )
X        unput( yytext[i] );
X    unput( '(' );
X    }
X
X.fi
XNote that since each
X.B unput()
Xputs the given character back at the
X.I beginning
Xof the input stream, pushing back strings must be done back-to-front.
X.IP -
X.B input()
Xreads the next character from the input stream.
XFor example,
Xthe following is one way to eat up C comments:
X.nf
X
X    %%
X    "/*"        {
X                register int c;
X
X                for ( ; ; )
X                    {
X                    while ( (c = input()) != '*' &&
X                            c != EOF )
X                        ;    /* eat up text of comment */
X
X                    if ( c == '*' )
X                        {
X                        while ( (c = input()) == '*' )
X                            ;
X                        if ( c == '/' )
X                            break;    /* found the end */
X                        }
X
X                    if ( c == EOF )
X                        {
X                        error( "EOF in comment" );
X                        break;
X                        }
X                    }
X                }
X
X.fi
X(Note that if the scanner is compiled using
X.B C++,
Xthen
X.B input()
Xis instead referred to as
X.B yyinput(),
Xin order to avoid a name clash with the
X.B C++
Xstream by the name of
X.I input.)
X.IP -
X.B yyterminate()
Xcan be used in lieu of a return statement in an action. It terminates
Xthe scanner and returns a 0 to the scanner's caller, indicating "all done".
XSubsequent calls to the scanner will immediately return unless preceded
Xby a call to
X.B yyrestart()
X(see below).
XBy default,
X.B yyterminate()
Xis also called when an end-of-file is encountered. It is a macro and
Xmay be redefined.
X.SH THE GENERATED SCANNER
XThe output of
X.I flex
Xis the file
X.B lex.yy.c,
Xwhich contains the scanning routine
X.B yylex(),
Xa number of tables used by it for matching tokens, and a number
Xof auxiliary routines and macros. By default,
X.B yylex()
Xis declared as follows:
X.nf
X
X    int yylex()
X        {
X        ... various definitions and the actions in here ...
X        }
X
X.fi
X(If your environment supports function prototypes, then it will
Xbe "int yylex( void )".) This definition may be changed by redefining
Xthe "YY_DECL" macro. For example, you could use:
X.nf
X
X    #undef YY_DECL
X    #define YY_DECL float lexscan( a, b ) float a, b;
X
X.fi
Xto give the scanning routine the name
X.I lexscan,
Xreturning a float, and taking two floats as arguments. Note that
Xif you give arguments to the scanning routine using a
XK&R-style/non-prototyped function declaration, you must terminate
Xthe definition with a semi-colon (;).
X.LP
XWhenever
X.B yylex()
Xis called, it scans tokens from the global input file
X.I yyin
X(which defaults to stdin). It continues until it either reaches
Xan end-of-file (at which point it returns the value 0) or
Xone of its actions executes a
X.I return
Xstatement.
XIn the former case, when called again the scanner will immediately
Xreturn unless
X.B yyrestart()
Xis called to point
X.I yyin
Xat the new input file. (
X.B yyrestart()
Xtakes one argument, a
X.B FILE *
Xpointer.)
XIn the latter case (i.e., when an action
Xexecutes a return), the scanner may then be called again and it
Xwill resume scanning where it left off.
X.LP
XBy default (and for purposes of efficiency), the scanner uses
Xblock-reads rather than simple
X.I getc()
Xcalls to read characters from
X.I yyin.
XThe nature of how it gets its input can be controlled by redefining the
X.B YY_INPUT
Xmacro.
XYY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
Xaction is to place up to
X.I max_size
Xcharacters in the character array
X.I buf
Xand return in the integer variable
X.I result
Xeither the
Xnumber of characters read or the constant YY_NULL (0 on Unix systems)
Xto indicate EOF. The default YY_INPUT reads from the
Xglobal file-pointer "yyin".
X.LP
XA sample redefinition of YY_INPUT (in the definitions
Xsection of the input file):
X.nf
X
X    %{
X    #undef YY_INPUT
X    #define YY_INPUT(buf,result,max_size) \\
X        { \\
X        int c = getchar(); \\
X        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
X        }
X    %}
X
X.fi
XThis definition will change the input processing to occur
Xone character at a time.
X.LP
XYou also can add in things like keeping track of the
Xinput line number this way; but don't expect your scanner to
Xgo very fast.
X.LP
XWhen the scanner receives an end-of-file indication from YY_INPUT,
Xit then checks the
X.B yywrap()
Xfunction.
XIf
X.B yywrap()
Xreturns false (zero), then it is assumed that the
Xfunction has gone ahead and set up
X.I yyin
Xto point to another input file, and scanning continues. If it returns
Xtrue (non-zero), then the scanner terminates, returning 0 to its
Xcaller.
X.LP
XThe default
X.B yywrap()
Xalways returns 1. Presently, to redefine it you must first
X"#undef yywrap", as it is currently implemented as a macro. As indicated
Xby the hedging in the previous sentence, it may be changed to
Xa true function in the near future.
X.LP
XThe scanner writes its
X.B ECHO
Xoutput to the
X.I yyout
Xglobal (default, stdout), which may be redefined by the user simply
Xby assigning it to some other
X.B FILE
Xpointer.
X.SH START CONDITIONS
X.I flex
Xprovides a mechanism for conditionally activating rules. Any rule
Xwhose pattern is prefixed with "<sc>" will only be active when
Xthe scanner is in the start condition named "sc". For example,
X.nf
X
X    <STRING>[^"]*        { /* eat up the string body ... */
X                ...
X                }
X
X.fi
Xwill be active only when the scanner is in the "STRING" start
Xcondition, and
X.nf
X
X    <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
X                ...
X                }
X
X.fi
Xwill be active only when the current start condition is
Xeither "INITIAL", "STRING", or "QUOTE".
X.LP
XStart conditions
Xare declared in the definitions (first) section of the input
Xusing unindented lines beginning with either
X.B %s
Xor
X.B %x
Xfollowed by a list of names.
XThe former declares
X.I inclusive
Xstart conditions, the latter
X.I exclusive
Xstart conditions. A start condition is activated using the
X.B BEGIN
Xaction. Until the next
X.B BEGIN
Xaction is executed, rules with the given start
Xcondition will be active and
Xrules with other start conditions will be inactive.
XIf the start condition is
X.I inclusive,
Xthen rules with no start conditions at all will also be active.
XIf it is
X.I exclusive,
Xthen
X.I only
Xrules qualified with the start condition will be active.
XA set of rules contingent on the same exclusive start condition
Xdescribe a scanner which is independent of any of the other rules in the
X.I flex
Xinput. Because of this,
Xexclusive start conditions make it easy to specify "mini-scanners"
Xwhich scan portions of the input that are syntactically different
Xfrom the rest (e.g., comments).
X.LP
XIf the distinction between inclusive and exclusive start conditions
Xis still a little vague, here's a simple example illustrating the
Xconnection between the two. The set of rules:
X.nf
X
X    %s example
X    %%
X    <example>foo           /* do something */
X
X.fi
Xis equivalent to
X.nf
X
X    %x example
X    %%
X    <INITIAL,example>foo   /* do something */
X
X.fi
X.LP
XThe default rule (to
X.B ECHO
Xany unmatched character) remains active in start conditions.
X.LP
X.B BEGIN(0)
Xreturns to the original state where only the rules with
Xno start conditions are active. This state can also be
Xreferred to as the start-condition "INITIAL", so
X.B BEGIN(INITIAL)
Xis equivalent to
X.B BEGIN(0).
X(The parentheses around the start condition name are not required but
Xare considered good style.)
X.LP
X.B BEGIN
Xactions can also be given as indented code at the beginning
Xof the rules section. For example, the following will cause
Xthe scanner to enter the "SPECIAL" start condition whenever
X.I yylex()
Xis called and the global variable
X.I enter_special
Xis true:
X.nf
X
X        int enter_special;
X
X    %x SPECIAL
X    %%
X        if ( enter_special )
X            BEGIN(SPECIAL);
X
X    <SPECIAL>blahblahblah
X    ...more rules follow...
X
X.fi
X.LP
XTo illustrate the uses of start conditions,
Xhere is a scanner which provides two different interpretations
Xof a string like "123.456". By default it will treat it as
Xthree tokens, the integer "123", a dot ('.'), and the integer "456".
XBut if the string is preceded earlier in the line by the string
X"expect-floats"
Xit will treat it as a single token, the floating-point number
X123.456:
X.nf
X
X    %{
X    #include <math.h>
X    %}
X    %s expect
X
X    %%
X    expect-floats        BEGIN(expect);
X
X    <expect>[0-9]+"."[0-9]+      {
X                printf( "found a float, = %f\\n",
X                        atof( yytext ) );
X                }
X    <expect>\\n           {
X                /* that's the end of the line, so
X                 * we need another "expect-floats"
X                 * before we'll recognize any more
X                 * numbers
X                 */
X                BEGIN(INITIAL);
X                }
X
X    [0-9]+      {
X                printf( "found an integer, = %d\\n",
X                        atoi( yytext ) );
X                }
X
X    "."         printf( "found a dot\\n" );
X
X.fi
XHere is a scanner which recognizes (and discards) C comments while
Xmaintaining a count of the current input line.
X.nf
X
X    %x comment
X    %%
X        int line_num = 1;
X
X    "/*"         BEGIN(comment);
X
X    <comment>[^*\\n]*        /* eat anything that's not a '*' */
X    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
X    <comment>\\n             ++line_num;
X    <comment>"*"+"/"        BEGIN(INITIAL);
X
X.fi
XNote that start-condition names are really integer values and
Xcan be stored as such. Thus, the above could be extended in the
Xfollowing fashion:
X.nf
X
X    %x comment foo
X    %%
X        int line_num = 1;
X        int comment_caller;
X
X    "/*"         {
X                 comment_caller = INITIAL;
X                 BEGIN(comment);
X                 }
X
X    ...
X
X    <foo>"/*"    {
X                 comment_caller = foo;
X                 BEGIN(comment);
X                 }
X
X    <comment>[^*\\n]*        /* eat anything that's not a '*' */
X    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
X    <comment>\\n             ++line_num;
X    <comment>"*"+"/"        BEGIN(comment_caller);
X
X.fi
XOne can then implement a "stack" of start conditions using an
Xarray of integers. (It is likely that such stacks will become
Xa full-fledged
X.I flex
Xfeature in the future.) Note, though, that
Xstart conditions do not have their own name-space; %s's and %x's
Xdeclare names in the same fashion as #define's.
X.SH MULTIPLE INPUT BUFFERS
XSome scanners (such as those which support "include" files)
Xrequire reading from several input streams.
XAs
X.I flex
Xscanners do a large amount of buffering, one cannot control
Xwhere the next input will be read from by simply writing a
X.B YY_INPUT
Xwhich is sensitive to the scanning context.
X.B YY_INPUT
Xis only called when the scanner reaches the end of its buffer, which
Xmay be a long time after scanning a statement such as an "include"
Xwhich requires switching the input source.
X.LP
XTo negotiate these sorts of problems,
X.I flex
Xprovides a mechanism for creating and switching between multiple
Xinput buffers. An input buffer is created by using:
X.nf
X
X    YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
X
X.fi
Xwhich takes a
X.I FILE
Xpointer and a size and creates a buffer associated with the given
Xfile and large enough to hold
X.I size
Xcharacters (when in doubt, use
X.B YY_BUF_SIZE
Xfor the size). It returns a
X.B YY_BUFFER_STATE
Xhandle, which may then be passed to other routines:
X.nf
X
X    void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
X
X.fi
Xswitches the scanner's input buffer so subsequent tokens will
Xcome from
X.I new_buffer.
XNote that
X.B yy_switch_to_buffer()
Xmay be used by yywrap() to set things up for continued scanning, instead
Xof opening a new file and pointing
X.I yyin
Xat it.
X.nf
X
X    void yy_delete_buffer( YY_BUFFER_STATE buffer )
X
X.fi
Xis used to reclaim the storage associated with a buffer.
X.LP
X.B yy_new_buffer()
Xis an alias for
X.B yy_create_buffer(),
Xprovided for compatibility with the C++ use of
X.I new
Xand
X.I delete
Xfor creating and destroying dynamic objects.
X.LP
XFinally, the
X.B YY_CURRENT_BUFFER
Xmacro returns a
X.B YY_BUFFER_STATE
Xhandle to the current buffer.
X.LP
XHere is an example of using these features for writing a scanner
Xwhich expands include files (the
X.B <<EOF>>
Xfeature is discussed below):
X.nf
X
X    /* the "incl" state is used for picking up the name
X     * of an include file
X     */
X    %x incl
X
X    %{
X    #define MAX_INCLUDE_DEPTH 10
X    YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
X    int include_stack_ptr = 0;
X    %}
X
X    %%
X    include             BEGIN(incl);
X
X    [a-z]+              ECHO;
X    [^a-z\\n]*\\n?        ECHO;
X
X    <incl>[ \\t]*      /* eat the whitespace */
X    <incl>[^ \\t\\n]+   { /* got the include file name */
X            if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
X                {
X                fprintf( stderr, "Includes nested too deeply" );
X                exit( 1 );
X                }
X
X            include_stack[include_stack_ptr++] =
X                YY_CURRENT_BUFFER;
X
X            yyin = fopen( yytext, "r" );
X
X            if ( ! yyin )
X                error( ... );
X
X            yy_switch_to_buffer(
X                yy_create_buffer( yyin, YY_BUF_SIZE ) );
X
X            BEGIN(INITIAL);
X            }
X
X    <<EOF>> {
X            if ( --include_stack_ptr < 0 )
X                {
X                yyterminate();
X                }
X
X            else
X                yy_switch_to_buffer(
X                     include_stack[include_stack_ptr] );
X            }
X
X.fi
X.SH END-OF-FILE RULES
XThe special rule "<<EOF>>" indicates
Xactions which are to be taken when an end-of-file is
Xencountered and yywrap() returns non-zero (i.e., indicates
Xno further files to process). The action must finish
Xby doing one of four things:
X.IP -
Xthe special
X.B YY_NEW_FILE
Xaction, if
X.I yyin
Xhas been pointed at a new file to process;
X.IP -
Xa
X.I return
Xstatement;
X.IP -
Xthe special
X.B yyterminate()
Xaction;
X.IP -
Xor, switching to a new buffer using
X.B yy_switch_to_buffer()
Xas shown in the example above.
X.LP
X<<EOF>> rules may not be used with other
Xpatterns; they may only be qualified with a list of start
Xconditions. If an unqualified <<EOF>> rule is given, it
Xapplies to
X.I all
Xstart conditions which do not already have <<EOF>> actions. To
Xspecify an <<EOF>> rule for only the initial start condition, use
X.nf
X
X    <INITIAL><<EOF>>
X
X.fi
X.LP
XThese rules are useful for catching things like unclosed comments.
XAn example: X.nf X X %x quote X %% X X ...other rules for dealing with quotes... X X <quote><<EOF>> { X error( "unterminated quote" ); X yyterminate(); X } X <<EOF>> { X if ( *++filelist ) X { X yyin = fopen( *filelist, "r" ); X YY_NEW_FILE; X } X else X yyterminate(); X } X X.fi X.SH MISCELLANEOUS MACROS XThe macro X.bd XYY_USER_ACTION Xcan be redefined to provide an action Xwhich is always executed prior to the matched rule's action. For example, Xit could be #define'd to call a routine to convert yytext to lower-case. X.LP XThe macro X.B YY_USER_INIT Xmay be redefined to provide an action which is always executed before Xthe first scan (and before the scanner's internal initializations are done). XFor example, it could be used to call a routine to read Xin a data table or open a logging file. X.LP XIn the generated scanner, the actions are all gathered in one large Xswitch statement and separated using X.B YY_BREAK, Xwhich may be redefined. By default, it is simply a "break", to separate Xeach rule's action from the following rule's. XRedefining X.B YY_BREAK Xallows, for example, C++ users to X#define YY_BREAK to do nothing (while being very careful that every Xrule ends with a "break" or a "return"!) to avoid suffering from Xunreachable statement warnings where because a rule's action ends with X"return", the X.B YY_BREAK Xis inaccessible. X.SH INTERFACING WITH YACC XOne of the main uses of X.I flex Xis as a companion to the X.I yacc Xparser-generator. X.I yacc Xparsers expect to call a routine named X.B yylex() Xto find the next input token. The routine is supposed to Xreturn the type of the next token as well as putting any associated Xvalue in the global X.B yylval. XTo use X.I flex Xwith X.I yacc, Xone specifies the X.B -d Xoption to X.I yacc Xto instruct it to generate the file X.B y.tab.h Xcontaining definitions of all the X.B %tokens Xappearing in the X.I yacc Xinput. This file is then included in the X.I flex Xscanner. 
For example, if one of the tokens is "TOK_NUMBER", Xpart of the scanner might look like: X.nf X X %{ X #include "y.tab.h" X %} X X %% X X [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER; X X.fi X.SH TRANSLATION TABLE XIn the name of POSIX compliance, X.I flex Xsupports a X.I translation table Xfor mapping input characters into groups. XThe table is specified in the first section, and its format looks like: X.nf X X %t X 1 abcd X 2 ABCDEFGHIJKLMNOPQRSTUVWXYZ X 52 0123456789 X 6 \\t\\ \\n X %t X X.fi XThis example specifies that the characters 'a', 'b', 'c', and 'd' Xare to all be lumped into group #1, upper-case letters Xin group #2, digits in group #52, tabs, blanks, and newlines into Xgroup #6, and X.I Xno other characters will appear in the patterns. XThe group numbers are actually disregarded by X.I flex; X.B %t Xserves, though, to lump characters together. Given the above Xtable, for example, the pattern "a(AA)*5" is equivalent to "d(ZQ)*0". XThey both say, "match any character in group #1, followed by Xzero-or-more pairs of characters Xfrom group #2, followed by a character from group #52." Thus X.B %t Xprovides a crude way for introducing equivalence classes into Xthe scanner specification. X.LP XNote that the X.B -i Xoption (see below) coupled with the equivalence classes which X.I flex Xautomatically generates take care of virtually all the instances Xwhen one might consider using X.B %t. XBut what the hell, it's there if you want it. X.SH OPTIONS X.I flex Xhas the following options: X.TP X.B -b XGenerate backtracking information to X.I lex.backtrack. XThis is a list of scanner states which require backtracking Xand the input characters on which they do so. By adding rules one Xcan remove backtracking states. If all backtracking states Xare eliminated and X.B -f Xor X.B -F Xis used, the generated scanner will run faster (see the X.B -p Xflag). Only users who wish to squeeze every last cycle out of their Xscanners need worry about this option. 
(See the section on PERFORMANCE XCONSIDERATIONS below.) X.TP X.B -c Xis a do-nothing, deprecated option included for POSIX compliance. X.IP X.B NOTE: Xin previous releases of X.I flex X.B -c Xspecified table-compression options. This functionality is Xnow given by the X.B -C Xflag. To ease the impact of this change, when X.I flex Xencounters X.B -c, Xit currently issues a warning message and assumes that X.B -C Xwas desired instead. In the future this "promotion" of X.B -c Xto X.B -C Xwill go away in the name of full POSIX compliance (unless Xthe POSIX meaning is removed first). X.TP X.B -d Xmakes the generated scanner run in X.I debug Xmode. Whenever a pattern is recognized and the global X.B yy_flex_debug Xis non-zero (which is the default), Xthe scanner will write to X.I stderr Xa line of the form: X.nf X X --accepting rule at line 53 ("the matched text") X X.fi XThe line number refers to the location of the rule in the file Xdefining the scanner (i.e., the file that was fed to flex). Messages Xare also generated when the scanner backtracks, accepts the Xdefault rule, reaches the end of its input buffer (or encounters Xa NUL; at this point, the two look the same as far as the scanner's concerned), Xor reaches an end-of-file. X.TP X.B -f Xspecifies (take your pick) X.I full table Xor X.I fast scanner. XNo table compression is done. The result is large but fast. XThis option is equivalent to X.B -Cf X(see below). X.TP X.B -i Xinstructs X.I flex Xto generate a X.I case-insensitive Xscanner. The case of letters given in the X.I flex Xinput patterns will Xbe ignored, and tokens in the input will be matched regardless of case. The Xmatched text given in X.I yytext Xwill have the preserved case (i.e., it will not be folded). X.TP X.B -n Xis another do-nothing, deprecated option included only for XPOSIX compliance. X.TP X.B -p Xgenerates a performance report to stderr. 
The report Xconsists of comments regarding features of the X.I flex Xinput file which will cause a loss of performance in the resulting scanner. XNote that the use of X.I REJECT Xand variable trailing context (see the BUGS section in flex(1)) Xentails a substantial performance penalty; use of X.I yymore(), Xthe X.B ^ Xoperator, Xand the X.B -I Xflag entail minor performance penalties. X.TP X.B -s Xcauses the X.I default rule X(that unmatched scanner input is echoed to X.I stdout) Xto be suppressed. If the scanner encounters input that does not Xmatch any of its rules, it aborts with an error. This option is Xuseful for finding holes in a scanner's rule set. X.TP X.B -t Xinstructs X.I flex Xto write the scanner it generates to standard output instead Xof X.B lex.yy.c. X.TP X.B -v Xspecifies that X.I flex Xshould write to X.I stderr Xa summary of statistics regarding the scanner it generates. XMost of the statistics are meaningless to the casual X.I flex Xuser, but the Xfirst line identifies the version of X.I flex, Xwhich is useful for figuring Xout where you stand with respect to patches and new releases, Xand the next two lines give the date when the scanner was created Xand a summary of the flags which were in effect. X.TP X.B -F Xspecifies that the X.ul Xfast Xscanner table representation should be used. This representation is Xabout as fast as the full table representation X.ul X(-f), Xand for some sets of patterns will be considerably smaller (and for Xothers, larger). In general, if the pattern set contains both "keywords" Xand a catch-all, "identifier" rule, such as in the set: X.nf X X "case" return TOK_CASE; X "switch" return TOK_SWITCH; X ... X "default" return TOK_DEFAULT; X [a-z]+ return TOK_ID; X X.fi Xthen you're better off using the full table representation. If only Xthe "identifier" rule is present and you then use a hash table or some such Xto detect the keywords, you're better off using X.ul X-F. 
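X.IP XAs a rough sketch of the keyword detection mentioned above, the lookup can live in a small C routine called from the single "identifier" rule. This version uses a sorted table searched with bsearch() rather than a hash table; the routine name classify_word() and the token values are assumptions made for illustration (real token codes would normally come from y.tab.h).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes, for illustration only. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* The table must stay sorted by name, since bsearch() relies on it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* bsearch() passes the search key first and a table entry second. */
static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *) key,
                  ((const struct keyword *) entry)->name);
}

/* Return the keyword's token code, or TOK_ID for any other word. */
int classify_word(const char *text)
{
    const struct keyword *kw = (const struct keyword *) bsearch(
        text, keywords, sizeof keywords / sizeof keywords[0],
        sizeof keywords[0], compare_keyword);

    return kw ? kw->token : TOK_ID;
}
```

XThe scanner then needs only the catch-all rule: X.nf X X [a-z]+ return classify_word( yytext ); X X.fi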
X.IP XThis option is equivalent to X.B -CF X(see below). X.TP X.B -I Xinstructs X.I flex Xto generate an X.I interactive Xscanner. Normally, scanners generated by X.I flex Xalways look ahead one Xcharacter before deciding that a rule has been matched. At the cost of Xsome scanning overhead, X.I flex Xwill generate a scanner which only looks ahead Xwhen needed. Such scanners are called X.I interactive Xbecause if you want to write a scanner for an interactive system such as a Xcommand shell, you will probably want the user's input to be terminated Xwith a newline, and without X.B -I Xthe user will have to type a character in addition to the newline in order Xto have the newline recognized. This leads to dreadful interactive Xperformance. X.IP XIf all this seems too confusing, here's the general rule: if a human will Xbe typing in input to your scanner, use X.B -I, Xotherwise don't; if you don't care about squeezing the utmost performance Xfrom your scanner and you Xdon't want to make any assumptions about the input to your scanner, Xuse X.B -I. X.IP XNote, X.B -I Xcannot be used in conjunction with X.I full Xor X.I fast tables, Xi.e., the X.B -f, -F, -Cf, Xor X.B -CF Xflags. X.TP X.B -L Xinstructs X.I flex Xnot to generate X.B #line Xdirectives. Without this option, X.I flex Xpeppers the generated scanner Xwith #line directives so error messages in the actions will be correctly Xlocated with respect to the original X.I flex Xinput file, and not to Xthe fairly meaningless line numbers of X.B lex.yy.c. X(Unfortunately X.I flex Xdoes not presently generate the necessary directives Xto "retarget" the line numbers for those parts of X.B lex.yy.c Xwhich it generated. So if there is an error in the generated code, Xa meaningless line number is reported.) X.TP X.B -T Xmakes X.I flex Xrun in X.I trace Xmode. It will generate a lot of messages to X.I stdout Xconcerning Xthe form of the input and the resultant non-deterministic and deterministic Xfinite automata. 
This option is mostly for use in maintaining X.I flex. X.TP X.B -8 Xinstructs X.I flex Xto generate an 8-bit scanner, i.e., one which can recognize 8-bit Xcharacters. On some sites, X.I flex Xis installed with this option as the default. On others, the default Xis 7-bit characters. To see which is the case, check the verbose X.B (-v) Xoutput for "equivalence classes created". If the denominator of Xthe number shown is 128, then by default X.I flex Xis generating 7-bit characters. If it is 256, then the default is X8-bit characters and the X.B -8 Xflag is not required (but may be a good idea to keep the scanner Xspecification portable). Feeding a 7-bit scanner 8-bit characters Xwill result in infinite loops, bus errors, or other such fireworks, Xso when in doubt, use the flag. Note that if equivalence classes Xare used, 8-bit scanners take only slightly more table space than X7-bit scanners (128 bytes, to be exact); if equivalence classes are Xnot used, however, then the tables may grow up to twice their X7-bit size. X.TP X.B -C[efmF] Xcontrols the degree of table compression. X.IP X.B -Ce Xdirects X.I flex Xto construct X.I equivalence classes, Xi.e., sets of characters Xwhich have identical lexical properties (for example, if the only Xappearance of digits in the X.I flex Xinput is in the character class X"[0-9]" then the digits '0', '1', ..., '9' will all be put Xin the same equivalence class). Equivalence classes usually give Xdramatic reductions in the final table/object file sizes (typically Xa factor of 2-5) and are pretty cheap performance-wise (one array Xlook-up per character scanned). X.IP X.B -Cf Xspecifies that the X.I full Xscanner tables should be generated - X.I flex Xshould not compress the Xtables by taking advantage of similar transition functions for Xdifferent states. X.IP X.B -CF Xspecifies that the alternate fast scanner representation (described Xabove under the X.B -F Xflag) Xshould be used. 
X.IP X.B -Cm Xdirects X.I flex Xto construct X.I meta-equivalence classes, Xwhich are sets of equivalence classes (or characters, if equivalence Xclasses are not being used) that are commonly used together. Meta-equivalence Xclasses are often a big win when using compressed tables, but they Xhave a moderate performance impact (one or two "if" tests and one Xarray look-up per character scanned). X.IP XA lone X.B -C Xspecifies that the scanner tables should be compressed but neither Xequivalence classes nor meta-equivalence classes should be used. X.IP XThe options X.B -Cf Xor X.B -CF Xand X.B -Cm Xdo not make sense together - there is no opportunity for meta-equivalence Xclasses if the table is not being compressed. Otherwise the options Xmay be freely mixed. X.IP XThe default setting is X.B -Cem, Xwhich specifies that X.I flex Xshould generate equivalence classes Xand meta-equivalence classes. This setting provides the highest Xdegree of table compression. You can trade off Xfaster-executing scanners at the cost of larger tables with Xthe following generally being true: X.nf X X slowest & smallest X -Cem X -Cm X -Ce X -C X -C{f,F}e X -C{f,F} X fastest & largest X X.fi XNote that scanners with the smallest tables are usually generated and Xcompiled the quickest, so Xduring development you will usually want to use the default, maximal Xcompression. X.IP X.B -Cfe Xis often a good compromise between speed and size for production Xscanners. X.IP X.B -C Xoptions are not cumulative; whenever the flag is encountered, the Xprevious -C settings are forgotten. X.TP X.B -Sskeleton_file Xoverrides the default skeleton file from which X.I flex Xconstructs its scanners. You'll never need this option unless you are doing X.I flex Xmaintenance or development. X.SH PERFORMANCE CONSIDERATIONS XThe main design goal of X.I flex Xis that it generate high-performance scanners. It has been optimized Xfor dealing well with large sets of rules. 
Aside from the effects Xof table compression on scanner speed outlined above, Xthere are a number of options/actions which degrade performance. These Xare, from most expensive to least: X.nf X X REJECT X X pattern sets that require backtracking X arbitrary trailing context X X '^' beginning-of-line operator X yymore() X X.fi Xwith the first three all being quite expensive and the last two Xbeing quite cheap. X.LP X.B REJECT Xshould be avoided at all costs when performance is important. XIt is a particularly expensive option. X.LP XGetting rid of backtracking is messy and often may be an enormous Xamount of work for a complicated scanner. In principle, one begins Xby using the X.B -b Xflag to generate a X.I lex.backtrack Xfile. For example, on the input X.nf X X %% X foo return TOK_KEYWORD; X foobar return TOK_KEYWORD; X X.fi Xthe file looks like: X.nf X X State #6 is non-accepting - X associated rule line numbers: X 2 3 X out-transitions: [ o ] X jam-transitions: EOF [ \\001-n p-\\177 ] X X State #8 is non-accepting - X associated rule line numbers: X 3 X out-transitions: [ a ] X jam-transitions: EOF [ \\001-` b-\\177 ] X X State #9 is non-accepting - X associated rule line numbers: X 3 X out-transitions: [ r ] X jam-transitions: EOF [ \\001-q s-\\177 ] X X Compressed tables always backtrack. X X.fi XThe first few lines tell us that there's a scanner state in Xwhich it can make a transition on an 'o' but not on any other Xcharacter, and that in that state the currently scanned text does not match Xany rule. The state occurs when trying to match the rules found Xat lines 2 and 3 in the input file. XIf the scanner is in that state and then reads Xsomething other than an 'o', it will have to backtrack to find Xa rule which is matched. With Xa bit of headscratching one can see that this must be the Xstate it's in when it has seen "fo". 
When this has happened, Xif anything other than another 'o' is seen, the scanner will Xhave to back up to simply match the 'f' (by the default rule). X.LP XThe comment regarding State #8 indicates there's a problem Xwhen "foob" has been scanned. Indeed, on any character other Xthan an 'a', the scanner will have to back up to accept "foo". XSimilarly, the comment for State #9 concerns when "fooba" has Xbeen scanned. X.LP XThe final comment reminds us that there's no point going to Xall the trouble of removing backtracking from the rules unless Xwe're using X.B -f Xor X.B -F, Xsince there's no performance gain doing so with compressed scanners. X.LP XThe way to remove the backtracking is to add "error" rules: X.nf X X %% X foo return TOK_KEYWORD; X foobar return TOK_KEYWORD; X X fooba | X foob | X fo { X /* false alarm, not really a keyword */ X return TOK_ID; X } X X.fi X.LP XEliminating backtracking among a list of keywords can also be Xdone using a "catch-all" rule: X.nf X X %% X foo return TOK_KEYWORD; X foobar return TOK_KEYWORD; X X [a-z]+ return TOK_ID; X X.fi XThis is usually the best solution when appropriate. X.LP XBacktracking messages tend to cascade. XWith a complicated set of rules it's not uncommon to get hundreds Xof messages. If one can decipher them, though, it often Xonly takes a dozen or so rules to eliminate the backtracking (though Xit's easy to make a mistake and have an error rule accidentally match Xa valid token. A possible future X.I flex Xfeature will be to automatically add rules to eliminate backtracking). X.LP X.I Variable Xtrailing context (where both the leading and trailing parts do not have Xa fixed length) entails almost the same performance loss as X.I REJECT X(i.e., substantial). 
So when possible a rule like: X.nf X X %% X mouse|rat/(cat|dog) run(); X X.fi Xis better written: X.nf X X %% X mouse/cat|dog run(); X rat/cat|dog run(); X X.fi Xor as X.nf X X %% X mouse|rat/cat run(); X mouse|rat/dog run(); X X.fi XNote that here the special '|' action does X.I not Xprovide any savings, and can even make things worse (see X.B BUGS Xin flex(1)). X.LP XAnother area where the user can increase a scanner's performance X(and one that's easier to implement) arises from the fact that Xthe longer the tokens matched, the faster the scanner will run. XThis is because with long tokens the processing of most input Xcharacters takes place in the (short) inner scanning loop, and Xdoes not often have to go through the additional work of setting up Xthe scanning environment (e.g., X.B yytext) Xfor the action. Recall the scanner for C comments: X.nf X X %x comment X %% X int line_num = 1; X X "/*" BEGIN(comment); X X <comment>[^*\\n]* X <comment>"*"+[^*/\\n]* X <comment>\\n ++line_num; X <comment>"*"+"/" BEGIN(INITIAL); X X.fi XThis could be sped up by writing it as: X.nf X X %x comment X %% X int line_num = 1; X X "/*" BEGIN(comment); X X <comment>[^*\\n]* X <comment>[^*\\n]*\\n ++line_num; X <comment>"*"+[^*/\\n]* X <comment>"*"+[^*/\\n]*\\n ++line_num; X <comment>"*"+"/" BEGIN(INITIAL); X X.fi XNow instead of each newline requiring the processing of another Xaction, recognizing the newlines is "distributed" over the other rules Xto keep the matched text as long as possible. Note that X.I adding Xrules does X.I not Xslow down the scanner! The speed of the scanner is independent Xof the number of rules or (modulo the considerations given at the Xbeginning of this section) how complicated the rules are with Xregard to operators such as '*' and '|'. X.LP XA final example in speeding up a scanner: suppose you want to scan Xthrough a file containing identifiers and keywords, one per line Xand with no other extraneous characters, and recognize all the Xkeywords. 
A natural first approach is: X.nf X X %% X asm | X auto | X break | X ... etc ... X volatile | X while /* it's a keyword */ X X .|\\n /* it's not a keyword */ X X.fi XTo eliminate the back-tracking, introduce a catch-all rule: X.nf X X %% X asm | X auto | X break | X ... etc ... X volatile | X while /* it's a keyword */ X X [a-z]+ | X .|\\n /* it's not a keyword */ X X.fi XNow, if it's guaranteed that there's exactly one word per line, Xthen we can reduce the total number of matches by a half by Xmerging in the recognition of newlines with that of the other Xtokens: X.nf X X %% X asm\\n | X auto\\n | X break\\n | X ... etc ... X volatile\\n | X while\\n /* it's a keyword */ X X [a-z]+\\n | X .|\\n /* it's not a keyword */ X X.fi XOne has to be careful here, as we have now reintroduced backtracking Xinto the scanner. In particular, while X.I we Xknow that there will never be any characters in the input stream Xother than letters or newlines, X.I flex Xcan't figure this out, and it will plan for possibly needing backtracking Xwhen it has scanned a token like "auto" and then the next character Xis something other than a newline or a letter. Previously it would Xthen just match the "auto" rule and be done, but now it has no "auto" Xrule, only a "auto\\n" rule. To eliminate the possibility of backtracking, Xwe could either duplicate all rules but without final newlines, or, Xsince we never expect to encounter such an input and therefore don't Xcare how it's classified, we can introduce one more catch-all rule, this Xone which doesn't include a newline: X.nf X X %% X asm\\n | X auto\\n | X break\\n | X ... etc ... X volatile\\n | X while\\n /* it's a keyword */ X X [a-z]+\\n | X [a-z]+ | X .|\\n /* it's not a keyword */ X X.fi XCompiled with X.B -Cf, Xthis is about as fast as one can get a X.I flex Xscanner to go for this particular problem. X.LP XA final note: X.I flex Xis slow when matching NUL's, particularly when a token contains Xmultiple NUL's. 
XIt's best to write rules which match X.I short Xamounts of text if it's anticipated that the text will often include NUL's. X.SH INCOMPATIBILITIES WITH LEX AND POSIX X.I flex Xis a rewrite of the Unix X.I lex Xtool (the two implementations do not share any code, though), Xwith some extensions and incompatibilities, both of which Xare of concern to those who wish to write scanners acceptable Xto either implementation. At present, the POSIX X.I lex Xdraft is Xvery close to the original X.I lex Ximplementation, so some of these Xincompatibilities are also in conflict with the POSIX draft. But Xthe intent is that except as noted below, X.I flex Xas it presently stands will Xultimately be POSIX conformant (i.e., that those areas of conflict with Xthe POSIX draft will be resolved in X.I flex's Xfavor). Please bear in Xmind that all the comments which follow are with regard to the POSIX X.I draft Xstandard of Summer 1989, and not the final document (or subsequent Xdrafts); they are included so X.I flex Xusers can be aware of the standardization issues and those areas where X.I flex Xmay in the near future undergo changes incompatible with Xits current definition. X.LP X.I flex Xis fully compatible with X.I lex Xwith the following exceptions: X.IP - XThe undocumented X.I lex Xscanner internal variable X.B yylineno Xis not supported. It is difficult to support this option efficiently, Xsince it requires examining every character scanned and reexamining Xthe characters when the scanner backs up. XThings get more complicated when the end of buffer or file is reached or a XNUL is scanned (since the scan must then be restarted with the proper line Xnumber count), or the user uses the yyless(), unput(), or REJECT actions, Xor the multiple input buffer functions. X.IP XThe fix is to add rules which, upon seeing a newline, increment Xyylineno. This is usually an easy process, though it can be a drag if some Xof the patterns can match multiple newlines along with other characters. 
X.IP Xyylineno is not part of the POSIX draft. X.IP - XThe X.B input() Xroutine is not redefinable, though it may be called to read characters Xfollowing whatever has been matched by a rule. If X.B input() Xencounters an end-of-file the normal X.B yywrap() Xprocessing is done. A ``real'' end-of-file is returned by X.B input() Xas X.I EOF. X.IP XInput is instead controlled by redefining the X.B YY_INPUT Xmacro. X.IP XThe X.I flex Xrestriction that X.B input() Xcannot be redefined is in accordance with the POSIX draft, but X.B YY_INPUT Xhas not yet been accepted into the draft (and probably won't; it looks Xlike the draft will simply not specify any way of controlling the Xscanner's input other than by making an initial assignment to X.I yyin). X.IP - X.I flex Xscanners do not use stdio for input. Because of this, when writing an Xinteractive scanner one must explicitly call fflush() on the Xstream associated with the terminal after writing out a prompt. XWith X.I lex Xsuch writes are automatically flushed since X.I lex Xscanners use X.B getchar() Xfor their input. Also, when writing interactive scanners with X.I flex, Xthe X.B -I Xflag must be used. X.IP - X.I flex Xscanners are not as reentrant as X.I lex Xscanners. In particular, if you have an interactive scanner and Xan interrupt handler which long-jumps out of the scanner, and Xthe scanner is subsequently called again, you may get the following Xmessage: X.nf X X fatal flex scanner internal error--end of buffer missed X X.fi XTo reenter the scanner, first use X.nf X X yyrestart( yyin ); X X.fi X.IP - X.B output() Xis not supported. XOutput from the X.B ECHO Xmacro is done to the file-pointer X.I yyout X(default X.I stdout). X.IP XThe POSIX draft mentions that an X.B output() Xroutine exists but currently gives no details as to what it does. X.IP - X.I lex Xdoes not support exclusive start conditions (%x), though they Xare in the current POSIX draft. 
X.IP - XWhen definitions are expanded, X.I flex Xencloses them in parentheses. XWith lex, the following: X.nf X X NAME [A-Z][A-Z0-9]* X %% X foo{NAME}? printf( "Found it\\n" ); X %% X X.fi Xwill not match the string "foo" because when the macro Xis expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" Xand the precedence is such that the '?' is associated with X"[A-Z0-9]*". With X.I flex, Xthe rule will be expanded to X"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match. XNote that because of this, the X.B ^, $, <s>, /, Xand X.B <<EOF>> Xoperators cannot be used in a X.I flex Xdefinition. X.IP XThe POSIX draft interpretation is the same as X.I flex's. X.IP - XTo specify a character class which matches anything but a right bracket (']'), Xin X.I lex Xone can use "[^]]" but with X.I flex Xone must use "[^\\]]". The latter works with X.I lex, Xtoo. X.IP - XThe X.I lex X.B %r X(generate a Ratfor scanner) option is not supported. It is not part Xof the POSIX draft. X.IP - XIf you are providing your own yywrap() routine, you must include a X"#undef yywrap" in the definitions section (section 1). Note that Xthe "#undef" will have to be enclosed in %{}'s. X.IP XThe POSIX draft Xspecifies that yywrap() is a function and this is very unlikely to change; so X.I flex users are warned Xthat X.B yywrap() Xis likely to be changed to a function in the near future. X.IP - XAfter a call to X.B unput(), X.I yytext Xand X.I yyleng Xare undefined until the next token is matched. This is not the case with X.I lex Xor the present POSIX draft. X.IP - XThe precedence of the X.B {} X(numeric range) operator is different. X.I lex Xinterprets "abc{1,3}" as "match one, two, or Xthree occurrences of 'abc'", whereas X.I flex Xinterprets it as "match 'ab' Xfollowed by one, two, or three occurrences of 'c'". The latter is Xin agreement with the current POSIX draft. X.IP - XThe precedence of the X.B ^ Xoperator is different. 
X.I lex Xinterprets "^foo|bar" as "match either 'foo' at the beginning of a line, Xor 'bar' anywhere", whereas X.I flex Xinterprets it as "match either 'foo' or 'bar' if they come at the beginning Xof a line". The latter is in agreement with the current POSIX draft. X.IP - XTo refer to yytext outside of the scanner source file, Xthe correct definition with X.I flex Xis "extern char *yytext" rather than "extern char yytext[]". XThis is contrary to the current POSIX draft but a point on which X.I flex Xwill not be changing, as the array representation entails a Xserious performance penalty. It is hoped that the POSIX draft will Xbe emended to support the X.I flex Xvariety of declaration (as this is a fairly painless change to Xrequire of X.I lex Xusers). X.IP - X.I yyin Xis X.I initialized Xby X.I lex Xto be X.I stdin; X.I flex, Xon the other hand, Xinitializes X.I yyin Xto NULL Xand then X.I assigns Xit to X.I stdin Xthe first time the scanner is called, providing X.I yyin Xhas not already been assigned to a non-NULL value. The difference is Xsubtle, but the net effect is that with X.I flex Xscanners, X.I yyin Xdoes not have a valid value until the scanner has been called. X.IP - XThe special table-size declarations such as X.B %a Xsupported by X.I lex Xare not required by X.I flex Xscanners; X.I flex Xignores them. X.IP - XThe name X.bd XFLEX_SCANNER Xis #define'd so scanners may be written for use with either X.I flex Xor X.I lex. 
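X.LP XFor the yylineno item in the list above, one way to cope with patterns that can match several newlines is to count the newlines in the matched text instead of adding a rule per newline. The following is only a sketch: the helper name count_lines() is an assumption, and yylineno here is an ordinary variable that the scanner specification must define itself, since flex does not supply it.

```c
/* Defined by the user; flex does not provide yylineno. */
int yylineno = 1;

/*
 * Hypothetical helper: add the number of newlines found in the
 * first len characters of text to yylineno.
 */
void count_lines(const char *text, int len)
{
    int i;

    for (i = 0; i < len; ++i)
        if (text[i] == '\n')
            ++yylineno;
}
```

XAn action whose pattern may contain newlines can then call count_lines( yytext, yyleng ). The count will be off if the action later pushes text back with yyless() or unput(), or if REJECT or yymore() causes the same characters to be seen by more than one action, so those cases need their own adjustment.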
X.LP XThe following X.I flex Xfeatures are not included in X.I lex Xor the POSIX draft standard: X.nf X X yyterminate() X <<EOF>> X YY_DECL X #line directives X %{}'s around actions X yyrestart() X comments beginning with '#' (deprecated) X multiple actions on a line X X.fi XThis last feature refers to the fact that with X.I flex Xyou can put multiple actions on the same line, separated with Xsemi-colons, while with X.I lex, Xthe following X.nf X X foo handle_foo(); ++num_foos_seen; X X.fi Xis (rather surprisingly) truncated to X.nf X X foo handle_foo(); X X.fi X.I flex Xdoes not truncate the action. Actions that are not enclosed in Xbraces are simply terminated at the end of the line. X.SH DIAGNOSTICS X.I reject_used_but_not_detected undefined Xor X.I yymore_used_but_not_detected undefined - XThese errors can occur at compile time. They indicate that the Xscanner uses X.B REJECT Xor X.B yymore() Xbut that X.I flex Xfailed to notice the fact, meaning that X.I flex Xscanned the first two sections looking for occurrences of these actions Xand failed to find any, but somehow you snuck some in (via a #include Xfile, for example). Make an explicit reference to the action in your X.I flex Xinput file. (Note that previously X.I flex Xsupported a X.B %used/%unused Xmechanism for dealing with this problem; this feature is still supported Xbut now deprecated, and will go away soon unless the author hears from Xpeople who can argue compellingly that they need it.) X.LP X.I flex scanner jammed - Xa scanner compiled with X.B -s Xhas encountered an input string which wasn't matched by Xany of its rules. X.LP X.I flex input buffer overflowed - Xa scanner rule matched a string long enough to overflow the Xscanner's internal input buffer (16K bytes by default - controlled by X.B YY_BUF_SIZE Xin "flex.skel". Note that to redefine this macro, you must first X.B #undef Xit). 
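X.LP XA minimal sketch of such a redefinition follows. The first definition only stands in for the 16K default that the skeleton supplies, so that the fragment compiles on its own; the new size of 32768 is an arbitrary value chosen for illustration.

```c
/* Stand-in for the default from the skeleton (16K per the text above). */
#ifndef YY_BUF_SIZE
#define YY_BUF_SIZE 16384
#endif

/* What the user would write: drop the old value, then define the new one. */
#undef YY_BUF_SIZE
#define YY_BUF_SIZE 32768   /* illustrative size only */
```

XIn a scanner specification only the last two lines are needed, and they would go inside %{ %} in the definitions section.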
X.LP X.I scanner requires -8 flag - XYour scanner specification includes recognizing 8-bit characters and Xyou did not specify the -8 flag (and your site has not installed flex Xwith -8 as the default). X.LP X.I Xfatal flex scanner internal error--end of buffer missed - XThis can occur in a scanner which is reentered after a long-jump Xhas jumped out (or over) the scanner's activation frame. Before Xreentering the scanner, use: X.nf X X yyrestart( yyin ); X X.fi X.LP X.I too many %t classes! - XYou managed to put every single character into its own %t class. X.I flex Xrequires that at least one of the classes share characters. X.SH DEFICIENCIES / BUGS XSee flex(1). X.SH "SEE ALSO" X.LP Xflex(1), lex(1), yacc(1), sed(1), awk(1). X.LP XM. E. Lesk and E. Schmidt, X.I LEX - Lexical Analyzer Generator X.SH AUTHOR XVern Paxson, with the help of many ideas and much inspiration from XVan Jacobson. Original version by Jef Poskanzer. The fast table Xrepresentation is a partial implementation of a design done by Van XJacobson. The implementation was done by Kevin Gong and Vern Paxson. X.LP XThanks to the many X.I flex Xbeta-testers, feedbackers, and contributors, especially Casey XLeedom, benson@odi.com, Keith Bostic, XFrederic Brehm, Nick Christopher, Jason Coughlin, XScott David Daniels, Leo Eskin, XChris Faylor, Eric Goldman, Eric XHughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht, XGreg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt, XJef Poskanzer, Jim Roskind, XDave Tallman, Frank Whaley, Ken Yap, and those whose names Xhave slipped my marginal mail-archiving skills but whose contributions Xare appreciated all the same. X.LP XThanks to Keith Bostic, John Gilmore, Craig Leres, Bob XMulcahy, Rich Salz, and Richard Stallman for help with various distribution Xheadaches. 
X.LP XThanks to Esmond Pitt and Earle Horton for 8-bit character support; Xto Benson Margulies and Fred XBurke for C++ support; to Ove Ewerlid for the basics of support for XNUL's; and to Eric Hughes for the basics of support for multiple buffers. X.LP XWork is being done on extending X.I flex Xto generate scanners in which the Xstate machine is directly represented in C code rather than tables. XThese scanners may well be substantially faster than those generated Xusing -f or -F. If you are working in this area and are interested Xin comparing notes and seeing whether redundant work can be avoided, Xcontact Ove Ewerlid (ewerlid@mizar.DoCS.UU.SE). X.LP XThis work was primarily done when I was at the Real Time Systems Group Xat the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there Xfor the support I received. X.LP XSend comments to: X.nf X X Vern Paxson X Computer Science Department X 4126 Upson Hall X Cornell University X Ithaca, NY 14853-7501 X X vern@cs.cornell.edu X decvax!cornell!vern X X.fi END_OF_FILE if test 65353 -ne `wc -c <'flexdoc.1'`; then echo shar: \"'flexdoc.1'\" unpacked with wrong size! fi # end of 'flexdoc.1' fi echo shar: End of archive 11 $of 13$. cp /dev/null ark11isdone MISSING="" for I in 1 2 3 4 5 6 7 8 9 10 11 12 13 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have unpacked all 13 archives. rm -f ark[1-9]isdone ark[1-9][0-9]isdone else echo You still need to unpack the following archives: echo " " ${MISSING} fi ## End of shell archive. exit 0 -- Mail submissions (sources or binaries) to <amiga@uunet.uu.net>. Mail comments to the moderator at <amiga-request@uunet.uu.net>. Post requests for sources, and general discussion to comp.sys.amiga.